How Hard is Counting Triangles in the Streaming Model
The problem of (approximately) counting the number of triangles in a graph is
one of the basic problems in graph theory. In this paper we study the problem
in the streaming model. We study the amount of memory required by a randomized
algorithm to solve this problem. In case the algorithm is allowed one pass over
the stream, we present a best possible lower bound of for graphs
with edges on vertices. If a constant number of passes is allowed,
we show a lower bound of , the number of triangles. We match,
in some sense, this lower bound with a 2-pass -memory algorithm
that solves the problem of distinguishing graphs with no triangles from graphs
with at least triangles. We present a new graph parameter -- the
triangle density, and conjecture that the space complexity of the triangles
problem is . We match this by a second algorithm that solves
the distinguishing problem using -memory.
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
The number of triangles is a computationally expensive graph statistic which
is frequently used in complex network analysis (e.g., transitivity ratio), in
various random graph models (e.g., exponential random graph model) and in
important real world applications such as spam detection, uncovering of the
hidden thematic structure of the Web and link recommendation. Counting
triangles in graphs with millions and billions of edges requires algorithms
which run fast, use a small amount of space, provide accurate estimates of the
number of triangles and preferably are parallelizable.
In this paper we present an efficient triangle counting algorithm which can
be adapted to the semistreaming model. The key idea of our algorithm is to
combine the sampling algorithm of Tsourakakis et al. and the partitioning of
the set of vertices into a high degree and a low degree subset respectively as
in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a
running time
and an approximation (multiplicative error), where is the number
of vertices, the number of edges and the maximum number of
triangles an edge is contained in.
Furthermore, we show how this algorithm can be adapted to the semistreaming
model with space usage and a constant number of passes (three) over the graph
stream. We apply our methods in various networks with several millions of edges
and we obtain excellent results. Finally, we propose a random projection based
method for triangle counting and provide a sufficient condition to obtain an
estimate with low variance.
Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models
for the Web Graph (WAW 2010)
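The sampling idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the sampling step is uniform edge sparsification: keep each edge independently with probability p, count triangles exactly in the sampled graph, and rescale by 1/p^3, since a triangle survives only if all three of its edges are kept. The function names and the `seed` parameter are hypothetical, and the sketch omits the degree-based vertex partitioning, so it is not the paper's full algorithm.

```python
import random


def count_triangles(edges):
    """Exact triangle count for an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Count common neighbours w with u < v < w, so each
                # triangle is counted exactly once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def estimate_triangles(edges, p, seed=None):
    """Estimate the triangle count by keeping each edge with probability p.

    Assumption: uniform, independent edge sampling. The count on the
    sampled graph is scaled by 1 / p**3 to give an unbiased estimate.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    sampled = [e for e in edges if rng.random() < p]
    return count_triangles(sampled) / p ** 3


# Usage: the complete graph on 5 vertices has exactly 10 triangles.
k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
exact = count_triangles(k5)
approx = estimate_triangles(k5, p=0.8, seed=1)
```

With p = 1 every edge is kept and the estimate equals the exact count. Smaller values of p reduce the work on the sampled graph at the cost of higher variance.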
Approximate Counting of Cycles in Streams
Subgraph counting is a fundamental problem in algorithm design and has many applications in data mining, biology, social networks, and many other domains. Over the past years this problem has been studied extensively from a theoretical point of view. Because of the intensive computational resources required, traditional algorithms are infeasible even for medium-sized graphs. A natural way to address this problem in a massive graph is to use the data streaming model, where edges arrive in an arbitrary order and the algorithm is required to use limited memory to approximate the number of subgraphs. Prior to our work, most subgraph counting algorithms were based on edge sampling. In this paper we develop a novel approach for counting cycles of an arbitrary but fixed size in the turnstile model, i.e., the input stream is a sequence of edge insertions and deletions. Our algorithm is based on the idea of computing instances of complex-valued random variables over the given stream, and improves drastically upon the naïve sampling algorithms. In contrast to most existing approaches, our algorithm can also be easily applied in the distributed setting. We believe that the idea of using complex-valued random variables will find further applications, in particular for counting more general subgraphs.
Counting arbitrary subgraphs in data streams
We study the subgraph counting problem in data streams. We provide the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G. Our estimator works in the turnstile model, i.e., can handle both edge-insertions and edge-deletions, and is applicable in a distributed setting. Prior to this work, estimators were known only for a few non-regular graphs in the case of edge-insertions, leaving the problem of counting general subgraphs in the turnstile model wide open. We further demonstrate the applicability of our estimator by analyzing its concentration for several graphs H and the case where G is a power law graph.
Annotations in Data Streams
The central goal of data stream algorithms is to process massive streams of data using sublinear storage space. Motivated by work in the database community on outsourcing database and data stream processing, we ask whether the space usage of such algorithms can be further reduced by enlisting a more powerful “helper” who can annotate the stream as it is read. We do not wish to blindly trust the helper, so we require that the algorithm be convinced of having computed a correct answer. We show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it. We also prove lower bounds on such tradeoffs, often nearly matching the upper bounds, via notions related to Merlin-Arthur communication complexity. Our results cover the classic data stream problems of selection, frequency moments, and fundamental graph problems such as triangle-freeness and connectivity. Our work is also part of a growing trend (including recent studies of multi-pass streaming, read/write streams and randomly ordered streams) of asking more complexity-theoretic questions about data stream processing. It is a recognition that, in addition to practical relevance, the data stream model raises many interesting theoretical questions in its own right.